88 results found.
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Dutch English French German Italian Polish Portuguese Spanish
Availability:
Freely Available
License:
CC BY 4.0
Size:
None
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
- Paper track: 8.1 Feature extraction and low-level feature model/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Laurent Besacier | Multilingual LibriSpeech (MLS) | /N |
Documentation:
https://arxiv.org/abs/2012.03411, English, public
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Catalan Chinese English Esperanto French German Italian Kabyle Kinyarwanda Persian Polish Russian Spanish Welsh
Availability:
Freely Available
License:
Creative Commons license
Size:
8.8k hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
- Paper track: 8.1 Feature extraction and low-level feature model/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Laurent Besacier | Common Voice | /N |
Documentation:
https://arxiv.org/pdf/1912.06670.pdf, English, public
Written
Ontology,
Language Type:
Multilingual
Languages:
Dutch English French Italian
Availability:
Freely Available
License:
Size:
285 KByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: Every Child Should Have Parents: A Taxonomy Refinement Algorithm Based on Hyperbolic Term Embeddings
- Paper track: Short/Textual Inference and Other Areas of Semantics
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rami Aly | SemEval-2016 Task 13 | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
English French Italian
Availability:
Freely Available
License:
For research purposes
Size:
15 thousand pairs Other
Production Status:
Newly created-in progress
Use:
Dialogue
- Paper title: CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech
- Paper track: Long/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yi-Ling Chung | CONAN | /N |
Documentation:
None
Written
Lexicon,
Language Type:
Multilingual
Languages:
Chinese French German Italian Japanese Spanish Thai Vietnamese
Availability:
Freely Available
License:
Creative Commons Attribution-Share-Alike License 3.0
Size:
22.5 GByte
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction
- Paper track: Long paper/
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yanyang Li | Wiki word vectors | /N |
Documentation:
There is publicly available documentation in English.
Written
Corpus,
Language Type:
Multilingual
Languages:
Dutch English French German Italian
Availability:
Freely Available
License:
Size:
4.2 MByte
Production Status:
Existing-updated
Use:
Document Classification, Text categorisation
- Paper title: "You Sound Just Like Your Father" Commercial Machine Translation Systems Include Stylistic Biases
- Paper track: Short/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Dirk Hovy | Review Translations | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None
Not Applicable
Contextualised word embeddings,
Language Type:
Monolingual
Languages:
Ancient Arabic Basque Bokmål Bulgarian Catalan Chinese Church Croatian Czech Danish Dutch English Estonian Finnish French Galician German Greek Hebrew Hindi Hungarian Indonesian Irish Italian Japanese Korean Latin Latvian Norwegian Nynorsk Old Persian Polish Portuguese Romanian Russian Simplified Chinese Slavonic Slovak Slovene Spanish Swedish Turkish Ukrainian Urdu Uyghur Vietnamese
Availability:
Freely Available
License:
none
Size:
18.4 GByte
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Treebank Embedding Vectors for Out-of-domain Dependency Parsing
- Paper track: Short/Syntax: Tagging, Chunking and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joachim Wagner | Elmo For Many Languages | /N |
Documentation:
https://www.aclweb.org/anthology/K18-2005/
Written
Treebank,
Language Type:
Monolingual
Languages:
Dutch German Italian Norwegian Portuguese
Availability:
Freely Available
License:
CreativeCommons
Size:
None
Production Status:
Use:
Parsing and Tagging
- Paper title: Extracting Headless MWEs from Dependency Parse Trees: Parsing, Tagging, and Joint Modeling Approaches
- Paper track: Long/Syntax: Tagging, Chunking and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Tianze Shi | Universal Dependencies 2.2 | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Dutch English German Italian
Availability:
License:
Size:
1.9 GByte
Production Status:
Existing-updated
Use:
semantics
- Paper title: Dscorer: A Fast Evaluation Metric for Discourse Representation Structure Parsing
- Paper track: Short/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jiangming Liu | Parallel Meaning Bank | /N |
Documentation:
None




